Skip to content

refactor: remove build() and async shuffling calculation#8688

Merged
matthewkeil merged 32 commits intoChainSafe:unstablefrom
guha-rahul:refactor_shiffling_cache
Jan 22, 2026
Merged

refactor: remove build() and async shuffling calculation#8688
matthewkeil merged 32 commits intoChainSafe:unstablefrom
guha-rahul:refactor_shiffling_cache

Conversation

@guha-rahul
Copy link
Copy Markdown
Contributor

@guha-rahul guha-rahul commented Dec 10, 2025

Motivation

  • When lodestar-z happens, BeaconStateAllForks will be a blocker and The build() method in ShufflingCache depends on it.
  • Post-Fulu proposer lookahead is stored in BeaconState, requiring shufflings synchronously during epoch transitions—making the async build() pattern no longer viable.

Description

  • Remove build() method from IShufflingCache interface and ShufflingCache class
  • Add set() to IShufflingCache interface to add shufflings
  • Remove asyncShufflingCalculation

Closes #8653

AI Assistance Disclosure

  • External Contributors: I have read the contributor guidelines and disclosed my usage of AI below.
    use claude to understand how ShufflingCache avoids recomputation

@guha-rahul guha-rahul requested a review from a team as a code owner December 10, 2025 22:50
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @guha-rahul, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the shuffling calculation mechanism by removing the asynchronous build() method and the asyncShufflingCalculation flag. This change is motivated by upcoming lodestar-z compatibility issues with BeaconStateAllForks and the need for synchronous shuffling calculations for proposer lookahead in the post-Fulu era. The IShufflingCache interface has been updated to reflect these changes, now providing a set() method for directly adding shufflings.

Highlights

  • Removal of Asynchronous Shuffling: The build() method and the asyncShufflingCalculation flag have been removed, shifting shuffling computations to a synchronous model to align with new requirements.
  • IShufflingCache Interface Update: The build() method in the IShufflingCache interface has been replaced with a set() method, allowing direct addition of EpochShuffling objects to the cache.
  • BeaconStateAllForks Compatibility: The changes address a future blocker related to BeaconStateAllForks and lodestar-z by removing the dependency of the build() method on it.
  • Post-Fulu Requirements: This refactor ensures that proposer lookahead, which requires synchronous shuffling post-Fulu, is correctly handled within the system.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively refactors the shuffling calculation by removing the asynchronous build() method and the asyncShufflingCalculation option. This change simplifies the codebase by enforcing synchronous shuffling calculation, which is a necessary step for upcoming features like post-Fulu proposer lookahead. The changes are consistent across all modified files, including interfaces, implementations, and tests. The IShufflingCache interface is updated to use a set() method, and the implementation in ShufflingCache correctly makes this method public. The epoch processing logic in EpochCache and EpochTransitionCache is also cleaned up by removing the now-obsolete async path. Overall, this is a solid refactoring. I have a couple of minor suggestions to improve code clarity and documentation.

@@ -380,7 +370,6 @@ export function beforeProcessEpoch(
});

// Trigger async build of shuffling for epoch after next (nextShuffling post epoch transition)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This comment is misleading as the asynchronous shuffling build logic has been removed in this pull request. It should be removed to avoid confusion for future developers.

Comment on lines 54 to 56
/**
* Queue asynchronous build for an EpochShuffling
* Add an EpochShuffling to the ShufflingCache.
*/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment for the set method in the IShufflingCache interface could be more descriptive. The implementation in ShufflingCache also handles resolving promises if they exist. Reflecting this behavior in the interface documentation would improve clarity for developers using this interface.

Suggested change
/**
* Queue asynchronous build for an EpochShuffling
* Add an EpochShuffling to the ShufflingCache.
*/
/**
* Add an EpochShuffling to the ShufflingCache. If a promise for the shuffling is present it will
* resolve the promise with the built shuffling.
*/

wemeetagain
wemeetagain previously approved these changes Dec 11, 2025
Copy link
Copy Markdown
Member

@wemeetagain wemeetagain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally lgtm

Copy link
Copy Markdown
Contributor

@twoeths twoeths left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR misses the context of #6938:

in the end it'd be more about reverting #6938, and we should deploy on a feature group to make sure there is no regression
also would like @matthewkeil to have a close look to make sure we don't miss anything

@matthewkeil
Copy link
Copy Markdown
Member

this PR misses the context of #6938:

in the end it'd be more about reverting #6938, and we should deploy on a feature group to make sure there is no regression also would like @matthewkeil to have a close look to make sure we don't miss anything

@twoeths It looks like you covered changes that will get reverted from #6938. The gist Rahul will be that there should be no shuffling cache on the epochCtx anymore. I think if you remove that from the EpochCache and follow all the codepaths that break from the change you will find most of the other hanging references not mentioned above.

Comment thread packages/beacon-node/src/chain/blocks/importBlock.ts Outdated
Comment thread packages/beacon-node/src/chain/chain.ts Outdated
Comment thread packages/beacon-node/src/chain/shufflingCache.ts Outdated
Comment thread packages/beacon-node/src/chain/shufflingCache.ts Outdated
Comment thread packages/state-transition/src/cache/epochCache.ts Outdated
Comment thread packages/state-transition/src/cache/epochCache.ts Outdated
Comment thread packages/state-transition/src/cache/epochCache.ts Outdated
Comment on lines +120 to +125
const parentEpoch = computeEpochAtSlot(parentBlockSlot);
if (parentEpoch < blockEpoch && postState.epochCtx.nextShuffling !== null) {
// current epoch and previous epoch are likely cached in previous states
this.shufflingCache.set(postState.epochCtx.nextShuffling, postState.epochCtx.nextDecisionRoot);
this.logger.verbose("Processed shuffling for next epoch", {parentEpoch, blockEpoch, slot: blockSlot});
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally this happened right before the epoch work. I think it should be ok doing it early here but for consistency sake can we move this down to just before the line if (blockSlot % SLOTS_PER_EPOCH === 0) { in case there are error cases that will cause us to exit the import early. We only cache a limited number of shufflings and will be better to not cache ones from imports that fail.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, should stay where it was originally

}) ??
// Only for testing. shufflingCache should always be available in prod
computeEpochShuffling(state, cache.nextShufflingActiveIndices, epoch);
const shuffling = computeEpochShuffling(state, cache.nextShufflingActiveIndices, epoch);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we cache this result when the state comes back out of the state transition in all cases? I think this will be the case but please check for places where processSlot or stateTransition is in the call stack and verify we are doing that.

Copy link
Copy Markdown
Member

@matthewkeil matthewkeil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps its better to do all calculations via the cache and that way there is no accidental duplication. Seems like an easier pattern to always use the build method instead of having free floating calls to computeEpochShuffling. Will keep everything centralized and coordinated. What do you think @twoeths ??

Comment thread packages/beacon-node/src/metrics/metrics/lodestar.ts
@guha-rahul guha-rahul requested a review from twoeths January 16, 2026 08:50
Comment thread packages/beacon-node/src/chain/shufflingCache.ts Outdated
Comment thread packages/state-transition/src/cache/epochCache.ts Outdated
@twoeths
Copy link
Copy Markdown
Contributor

twoeths commented Jan 17, 2026

@guha-rahul this e2e test consistently failed

[E2E tests: packages/beacon-node/test/e2e/api/impl/lightclient/endpoint.test.ts#L97](https://github.com/ChainSafe/lodestar/pull/8688/files#annotation_44242307082)
AssertionError: expected 4 to be 5 // Object.is equality - Expected + Received - 5 + 4 ❯ packages/beacon-node/test/e2e/api/impl/lightclient/endpoint.test.ts:97:47

also lint CI is failing

@guha-rahul guha-rahul requested a review from twoeths January 17, 2026 10:14
Comment thread packages/beacon-node/src/chain/chain.ts Outdated
@twoeths
Copy link
Copy Markdown
Contributor

twoeths commented Jan 20, 2026

the branch was deployed for >9h

Screenshot 2026-01-20 at 10 07 27
  • size, hit, miss rate are the same

  • set_rate_per_epoch is new in this PR, it ranges from ~7 to ~9 now which makes sense (it's very cheap check so it's not an issue setting the same shuffling there)

    • when import block, parent epoch is less than block epoch we set 3 times
    • when import_block, if slot is multiple of 32, we set 3 times. But there could be skip slots in this case
    • when regen process_slot, we set 3 times
  • at start of each epoch, we query ShufflingCache for attester duties. It works as before

Screenshot 2026-01-20 at 10 12 42
  • we also get shuffling from ShufflingCache to validate gossip attestations, it works as before
    • decision root was queried from fork-choice, and we still can get shuffling which proves the decision root was computed correctly inside the cache
Screenshot 2026-01-20 at 10 17 49 Screenshot 2026-01-20 at 10 20 16

buckets: [0.5, 1, 1.5, 2],
}),
shufflingCalculationTime: register.histogram<{source: "build" | "getSync"}>({
shufflingCalculationTime: register.histogram({
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that we don't have this metric anymore so need to remove it

  • pre-fulu this metric was useful to know how much time we save
  • post-fulu we have to compute shuffling for epoch n+2 so we don't have that saving time. I suppose that takes up the processProposerLookahead time

twoeths
twoeths previously approved these changes Jan 20, 2026
Comment thread packages/beacon-node/src/metrics/metrics/lodestar.ts
Comment thread pnpm-lock.yaml
matthewkeil
matthewkeil previously approved these changes Jan 22, 2026
Copy link
Copy Markdown
Member

@matthewkeil matthewkeil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!! 🚀

Metrics are all looking good

@matthewkeil matthewkeil merged commit cc77fcf into ChainSafe:unstable Jan 22, 2026
19 of 23 checks passed
@codecov
Copy link
Copy Markdown

codecov bot commented Jan 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.06%. Comparing base (b79f41e) to head (9aad853).
⚠️ Report is 21 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #8688      +/-   ##
============================================
+ Coverage     52.02%   52.06%   +0.04%     
============================================
  Files           848      848              
  Lines         64650    64480     -170     
  Branches       4767     4757      -10     
============================================
- Hits          33632    33572      -60     
+ Misses        30949    30839     -110     
  Partials         69       69              
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@wemeetagain
Copy link
Copy Markdown
Member

🎉 This PR is included in v1.40.0 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor ShufflingCache

5 participants